Overcoming attenuation bias in regressions using polygenic indices

Measurement error in polygenic indices (PGIs) attenuates the estimation of their effects in regression models. We analyze and compare two approaches addressing this attenuation bias: Obviously Related Instrumental Variables (ORIV) and the PGI Repository Correction (PGI-RC). Through simulations, we show that the PGI-RC performs slightly better than ORIV, unless the prediction sample is very small (N < 1000) or when there is considerable assortative mating. Within families, ORIV is the best choice since the PGI-RC correction factor is generally not available. We verify the empirical validity of the simulations by predicting educational attainment and height in a sample of siblings from the UK Biobank. We show that applying ORIV between families increases the standardized effect of the PGI by 12% (height) and by 22% (educational attainment) compared to a meta-analysis-based PGI, yet estimates remain slightly below the PGI-RC estimates. Furthermore, within-family ORIV regression provides the tightest lower bound for the direct genetic effect, increasing the lower bound for the standardized direct genetic effect on educational attainment from 0.14 to 0.18 (+29%), and for height from 0.54 to 0.61 (+13%) compared to a meta-analysis-based PGI.


Supplementary Methods 1: Simulations using GNAMES
In this section, we describe the outline of our simulations. For this purpose we developed a tool called GNAMES (Genetic Nurture and Assortative Mating Effects Simulator) available on https://github.com/devlaming/gnames. General usage and technical details (in terms of computational efficiency, indexing, etc.) are described in the GitHub repository.

Initialisation
We start with a population comprising N unrelated founders, for which we observe data on M independent SNPs. Let G i j denote the genotype of founder i at SNP j. G i j is drawn from Binom (2, f j ), where f j denotes the frequency of the coded allele for SNP j. Each allele frequency is sampled independently from a Beta (0.35, 0.35) distribution until it lies in [0.1, 0.9].
We draw an M × 1 vector of direct SNP effects β and an M × 1 vector of parental SNP effects γ (i.e., genetic nurture) as follows: where N (·) denotes the Normal distribution, D denotes a diagonal matrix, for which element j, j equals (2 f j (1 − f j )) −1 , and where h 2 denotes the panmictic SNP-based heritability and n 2 the panmictic proportion of variance accounted for by genetic nurture effects. These SNP effects have the following properties: • rarer variants have bigger effects than common variants, precisely to such an extent that a common and a rare SNP, on average, explain the same proportion of variation in the outcome, which aligns with SNP-effect assumptions made in, for instance, GREML analyses 1 ; • for both the direct SNP effects and the genetic nurture effects, we assume complete polygenicity (i.e., all SNPs affect the outcome); • genetic nurture and direct SNP effects are independent; • SNP effects remain constant across generations (relative to each other; the overall scale may change to maintain target heritability).

Simulating phenotypes subject to genetic-nurture effects
In a given Generation t, comprising N individuals, let G denote the N × M matrix of raw genotypes (i.e., count data between zero and two) and let G P (resp. G M ) denote the matrix of paternal (maternal) genotypes. The equations for the direct genetic component, the genetic nurture component, and the unique environment component can therefore be written as follows: n * = (G P + G M ) γ, and (4) ε * ∼ N (0, I) .
For the founders (i.e., Generation 0) we set n * ∼ N (0, I). Under random mating, g * has a variance equal to h 2 and n * a variance equal to n 2 -these quantities are not expected to change from one generation to the next. Under assortative mating (AM), we have the same variances in Generation 0. However, in that case, the variance of g * and of n * are both expected to increase over generations, evolving towards the equilibrium heritability (i.e., h 2 ∞ > h 2 ) and the equilibrium contribution for the genetic 3/26 nurture component (i.e., n 2 ∞ > n 2 ). Therefore, to keep 'contemporary' h 2 and n 2 fixed across generations (in order to have a clear benchmark in our simulations), by default, GNAMES empirically standardises these components to have mean zero and unit variance in each generation, yielding g, n, and ε. Because of this standardisation, it holds that: GNAMES also has an option to disable the per-generation standardisation. In that case, outcome y in each generation is generated as: We use this option in Supplementary Information 3.3.

Assortative mating
In a given generation, we match pairs of individuals based on their value for the phenotype (i.e., we assign pairs of individuals to be mates). We follow the matching procedure as proposed by Border et al. 2 . More specifically, individuals are matched such that their phenotypic correlation equals the desired level ρ AM . This parameter is such that ρ AM = 0 corresponds to random mating whereas ρ AM = 1 correspond to assortative mating (AM) of such a degree that mates have a perfect phenotypic correlation.
We set up this procedure such that first-degree relatives in a given generation (i.e., siblings) cannot be matched. To this end, we assign Sibling 1 from each nuclear family to Group 1 and Sibling 2 from each nuclear family to Group 2, etc. Next, we apply the matching procedure by Border et al. 2 within each group. The phenotypic correlation between mates then still equals the desired level ρ AM , even across groups (as borne out by our simulations).
Each mating pair has C = 2 children. Thus, if Generation t comprises N individuals (assuming N is an even number), there are N/2 mating pairs, yielding CN/2 = 2N/2 = N children in total. Hence, in Generation t + 1 there are also N individuals.

Partitioning data
We consider T = 10 generations. That is, we start at founders (t = 0), after which we have 10 generations of matching and mating. Assuming N is an even number, by the end of the simulation we have data on 1 2 N nuclear families. Per nuclear family, we have data on C = 2 siblings.
We partition the sibling data into three parts: • GWAS Sample 1: Consider N GWAS nuclear families, and per nuclear family consider only one of the two siblings for GWAS 1 (omitting parents).
• GWAS Sample 2: Consider N GWAS nuclear families, and per nuclear family consider only one of the two siblings for GWAS 2 (omitting parents).
• Prediction Sample: Consider 1 2 N prediction nuclear families, and per nuclear family consider both siblings (omitting parents), yielding a total prediction sample size of N prediction (also assuming N prediction is even).
These three samples are non-overlapping. By partitioning the data as such we can perform between-family GWASs and use the resulting SNP-effect estimates to construct PGIs that we can use for between-and within-family prediction. The sample size of both GWAS 1 and GWAS 2 is equal to N GWAS here. The sample size of the meta-analysis equals N MA = 2N GWAS . Finally, the sample size for prediction equals N prediction . 4/26

Imperfect genetic correlation
To investigate how imperfect genetic correlations between discovery and prediction samples impact the benchmark (OLS regression on the meta-analysis PGI), ORIV, and the PGI-RC, GNAMES also simulates a secondary trait y * according to the following set of equations: in case of per-generation standardisation, and where g * (resp. n * and ε * ) is standardised to have mean zero and unit variance, denoted by g ( n and ε). In these equations, where Under these definitions, in GWAS Sample 2 and in the Prediction Sample we may consider y while in GWAS Sample 1 we consider y * . In this case, the imperfect genetic correlation exists between one GWAS sample versus the other, while the prediction sample has a perfect genetic correlation with one of the two GWAS samples. Similarly, GWAS Sample 1 and GWAS Sample 2 may both consider y (i.e., both GWASs and the meta-analysis consider y), while the target trait in the prediction sample is y * . In this case, the imperfect genetic correlation exists only between prediction samples on the one hand, and GWAS samples on the other.

Simulation designs
We use various simulation designs to generate the results in the main paper. Below, we present each design with its input parameters. In addition to specifying parameter values, we also provide a link to the code for one run of that particular simulation design, and the corresponding Figure or Table in the paper. In a design where we fix ρ G = 1, this is simply short-hand notation for using y both in GWAS 1, GWAS 2, and in the prediction sample (i.e., the secondary phenotype, y * is ignored altogether in this case).

Meta-analysis as implemented in GNAMES
In our simulations using GNAMES, we pool the data used in the two main GWASs, and use this pooled data for a third GWAS, which we refer to as a meta-analysis. Although the latter technically is not a meta-analysis, we here show that this pooled simple GWAS is equivalent to a meta-analysis. As such, we can safely refer to the pooled GWAS as a meta-analysis within the context of our simulations.
We assume there are no control variables to take into account. Moreover, we assume both the SNP and outcome have been standardised to mean zero and unit variance, and we assume a given SNP individually to explain only a negligible proportion of the variation in the outcome. Consider SNP x and outcome y (both N × 1 vectors). We split the sample in two chunks with size N 1 and N 2 , such that N = N 1 + N 2 . Let i = 1, 2 denote an index for the two corresponding subsamples.
The GWAS estimate, for the given SNP, in Subsample i = 1, 2 can be written as follows: where x i (resp. y i ) denotes the SNP (outcome) vector in Subsample i, and where the last equality holds because SNPs have been standardised (i.e., 1 N i x ⊤ i x i = 1). Moreover, the GWAS z-score in Subsample i can be written as where σ 2 y i is the phenotypic variance in Subsample i not accounted for by the given SNP. As SNPs are standardised, we have that 1 N i x ⊤ i x i = 1. Moreover, as the SNP explains hardly any variation in y, we have that σ 2 y i ≈ 1. Therefore, we can approximate the right-hand side in Supplementary Equation 18 as follows: Moreover, under this standardisation the following approximation of the SNP-effect estimate as function of its z-score also holds: The sample-size weighted meta-analyses z-score 3 is given by

7/26
which for the case of two subsamples reduces to Using Supplementary Equation 19, the meta-analysis z-score can be approximated as follows: Moreover, using our approximation for β as a function of its z-score in Supplementary Equation 20, we can approximate our meta-analysis SNP-effect estimate as a function of the meta-analysis z-score: where N = N 1 + N 2 . Combining Supplementary Equations 23 and 24 yields that the meta-analysis SNP-effect estimate can be approximated as follows: Now, consider the SNP-effect estimate by applying ordinary least squares (OLS) to pooled data directly (instead of using meta-analysis). This OLS estimate equals: where x ⊤ 1 x 1 = N 1 and x ⊤ 2 x 2 = N 2 by virtue of aforementioned SNP standardisation. Thus, this expression reduces to Similarly, we also have that Z pooled ≈ Z MA . This finding aligns with the work by 4 , showing that there are no gains in statistical efficiency by analysing pooled data in a GWAS instead of using meta-analysis.
To illustrate the equivalence, we simulate SNP and outcome data, which we then split into two subsamples. We perform a GWAS in each subsample as well as in the pooled data. Finally, we meta-analyse results from the two subsamples, and compare these to results from the GWAS on pooled data. Supplementary Figure 1 shows pooled GWAS z-scores versus meta-analysis z-scores. The two sets of z-scores are virtually indistinguishable; they have the same scale and an R 2 of 99.99%. Therefore, in the context of GNAMES, we can talk about a pooled GWAS and meta-analysis interchangeably.

Correlation between direct and indirect genetic effects
In our simulation design with both genetic nurture (GN) and assortative mating (AM) there is an emergent correlation between Similarly, TT mating with TT always yields TT offspring. Also, AT individuals mating with AT individuals yield AA offsprings in 25% of the cases, TT also 25%, and AT in 50% of the cases. This illustrates (rather informally) a well-known result: under AM, homozygotes tend to become more prevalent for variants affecting the mating trait. In short, such a SNP will start violating Hardy-Weinberg Equilibrium (HWE) across generations, due to heterozygotes being 'underrepresented' and homozygotes being 'overrepresented'. Now imagine we have two SNPs, starting in HWE amongst the founders, both affecting the mating trait via direct and indirect effects. When the direct and indirect effects of a given SNP reinforce each other, by having the same sign, such a SNP will tend to violate HWE more quickly than a SNP for which the direct and indirect effect cancel out to some degree, by having a different sign. As the direct and indirect effect are independently drawn, such reinforcement or cancelling out depends purely on chance. However, the SNPs with aligned direct and indirect effects will violate HWE more quickly and, hence, have an inflated variance more rapidly than SNPs for which there is some cancelling out of the direct and indirect effect.

True coefficient when genetic nurture effects are present
Here, we derive the expected ORIV coefficient in the absence of assortative mating (AM), but in the presence of genetic nurture (GN), in case the direct SNP effects (δ ) and the parental genetic nurture SNP effects (ν) have a non-zero correlation, here denoted by ρ δ ,ν . In this case, the expected coefficient is a function of the correlation ρ δ ,ν . This correlation has a simple 9/26 relationship with the correlation between the total direct and indirect component of the trait, denoted by ρ g,n . As the combination of genetic nurture (GN) and AM may result in ρ g,n ̸ = 0 even when ρ δ ,ν = 0 (as discussed in Appendix 1.8), we posit that this formula can also be used to give a precise expectation for the ORIV coefficient in that specific scenario. Results from our simulation support our claim that this expectation is indeed accurate.
The data generating process when SNP effects are correlated In the simulation of phenotypes subject to genetic nurture, we followed the simple linear model described in Supplementary Equations 3-6, where the SNP effects were drawn from the distributions outlined in Supplementary Equations 1 and 2. In those equations, scaling of the direct genetic component g by h 2 and the indirect component n by n 2 was not done directly via the random SNP effects. Instead, the direct and indirect components were first empirically standardised to mean zero and unit variance, after which they were assigned the appropriate weights in the equation for the phenotype. This process is asymptotically equivalent to formulating y i (i.e., the phenotype for individual i) as follows: where x i j denotes the standardised (instead of raw) SNP data for individual i at SNP j. Similarly, p i j and m i j denote the standardised paternal and maternal SNP data for the same individual at the same SNP. δ j denotes the direct effect of standardised SNP j, and ν j the paternal and maternal genetic nurture effect of that standardised SNP. In case ρ δ ,ν = 0, this formulation (i) still ensures the variants jointly explain h 2 in terms of the direct component and n 2 in terms of the indirect component, and (ii) is asymptotically equivalent to the design presented in Supplementary Equations 1-6. From this model, it follows that the phenotypic variance accounted for by SNP j equals: Given SNP effects are uncorrelated with the SNP data, we can rewrite this contribution as: Recognising that x i j , p i j , and m i j are mean-centred, the expectations of these random variables (RVs) are equal to the variances (which all equal one, due to standardisation). In addition, we have that E δ 2 is the covariance between the offspring and paternal genotype, which equals the correlation, given standardisation, which equals 0.5. Similarly, E [x i j m i j ] = 0.5. Finally, observe that by the assumption of no AM, we have that E [m i j p i j ] = 0. Thus, the phenotypic variance accounted for (VAF) by SNP j is given by

10/26
Therefore, aggregating across M SNPs, we have that the phenotypic VAF of all SNPs jointly (in terms of both the direct effect, indirect effect, and their covariance), is given by Relationship between ρ δ ,ν and ρ g,n Let ρ g,n denote the correlation between the total direct and indirect component for As this correlation is the same per SNP, the summation can be ignored without loss of generality. We can now rewrite this correlation as follows: Conversely, we have that ρ δ ,ν = √ 2ρ g,n . This result shows that the correlation between the direct and indirect component is always smaller than the correlation between the underlying SNP effects in terms of the direct contribution and the contribution via the parental genotypes (i.e., effects δ j and ν j ). This relationship is intuitive: the total correlation between the direct and indirect component is constrained by two things: imperfect correlation between parent-offspring genotypes (i.e., equal to 0.5 both on the paternal and maternal side) and a given correlation between the SNP effects δ j and ν j . , the variance accounted for by all SNPs jointly can be written as follows: As a result, the phenotypic variance of Y is now given by:

11/26
That is, the variance of Y is different from one when ρ g,n ̸ = 0. This implies that the SNP-based heritability and the proportion of variance accounted for by GN are no longer h 2 and n 2 in this case, but rather: Only when ρ g,n = 0 can h 2 indeed be directly interpreted as the SNP-based heritability (and n 2 interpreted analogously for the nurture component) in our simulations.
Expected coefficient Relying on asymptotic arguments, we can turn our attention to the expected coefficient in case Assuming the data-generating process (DGP) in Supplementary Equation 30 holds true, and writing this DGP in vector notation, we have that where bold-faced letters here denote N × 1 vectors. Now, observe that, under aforementioned standardisation of SNP data and the absence of linkage disequilibrium, the GWAS estimator for SNP j in a sample of N unrelated individuals can be written as follows: where ω j is the estimation error in the parent-offspring correlations in genotypes (which is negligible for large N) and where η j is the GWAS estimation error that ORIV addresses by using a PGI constructed using estimates from one GWAS sample as instrument for a PGI constructed using estimates from another GWAS sample. In other words, for large GWAS sample sizes, ORIV manages to separate the true contribution of δ j + ν j from the contribution of η j due to estimation error.
Consider an individual i in the hold-out sample with standardised genotype x i j at SNP j. The contribution of that SNP towards the PGI for that individual that ORIV distills is given by On the other hand, the real contribution of that SNP (also in terms of genetic nurture contributions) towards the phenotype of that individual is given by

12/26
Hence, we can quantify the phenotypic VAF by ORIV at locus j as follows: Cov Aggregating across all M SNPs, ORIV will, therefore, tag the following part of the phenotypic variance: As a result, the expected coefficient from the true regression under a correlation between the direct and indirect component ρ g,n ̸ = 0 is given by: Since a positive value for ρ g,n in our simulations arises naturally in a scenario with AM and GN (we did not fix ρ g,n ourselves), Supplementary Table 1 shows a summary of the simulation results. While the PGI-RC results are biased upwards in the presence of assortative mating, the ORIV estimates are not. When both genetic nurture and assortative mating are present, the target parameter changes due to the arising positive correlation between direct and indirect genetic effects. In this case, ORIV continues to approximate the target parameter very well, whereas the PGI-RC remains biased. These results are graphically presented in Figure 4c in the main text.

16/26 3 Supplementary Results 2: Robustness Simulation Results
In this section we describe the Supplementary Results regarding the robustness of our simulation results.

SNP
In our main set of simulations we set the SNP-based heritability to 25%, corresponding to β st = 0.5. Here, we seek to investigate the generalizability of our simulation findings to different levels of SNP-based heritability. In particular, we decrease the SNP-based heritability to 0.15 (i.e., roughly the SNP-based heritability of EA in our empirical analysis) and to 0.5 (i.e., roughly the SNP-based heritability of height in our empirical analysis). In investigating scenarios, for comparability, we maintain the same ratio of the R-squared (R 2 ) to the SNP-based heritability (h 2 SNP ) as we use in Table 1. That is, since EA2 roughly corresponds to a ratio of 0.16(= 0.042/0.25) and EA4 corresponds to a ratio of 0.6(= 0.15/0.25), we investigate a similar ratio for the other two levels of heritability. Supplementary Table 8 presents the results. Table 8. Estimated coefficients for the Polygenic Index (PGI) and their corresponding 95% confidence intervals using meta-analysis, Obviously Related Instrumental Variables (ORIV), and the PGI-RC procedure in the baseline scenario (no genetic nurture, no assortative mating; between-family analyses only).  Table 1, ORIV is somewhat biased for relatively small GWAS samples, while the PGI-RC is somewhat sensitive to a small prediction sample. Still, both methods clearly outperform a meta-analysis based PGI. For larger GWAS and prediction samples, the two methods both perform accurately, with little differences between them. Applying the default PGI-RC produces slightly narrower confidence intervals. Together, these simulation results show that the performance of a meta-analysis based PGI, ORIV or the PGI-RC are not driven by the absolute value of β st or the SNP-based heritability, such that our conclusions are not specific to any level of SNP-based heritability.

Sample size GWAS
The expected R-squared of a given PGI in a regression depends on the ratio of the number of SNPs over the GWAS discovery sample. Therefore, to keep the simulation space manageable we proportionally reduced both the number of SNPs as well as the GWAS discovery sample to generate realistic values for the R-squared of the PGIs. In order to verify that this shortcut did not meaningfully affect our results, we conducted a simulation in a scenario with genetic nurture where we increased the number of SNPs from 5,000 to 100,000 and the GWAS discovery sample to 100,000. We hold the discovery sample fixed at N = 16, 000.
In turn, we vary the level of assortative mating (AM = 0 or 1).
Supplementary Figure 2 shows a design without assortative mating, and a design with assortative mating of 1, respectively.
As can be seen, when holding constant the GWAS discovery sample size, and without assortative mating, Supplementary Figure   2a shows that both ORIV as well as the PGI-RC produce a consistent estimator, whereas a meta-analysis PGI is substantially biased. In case of strong assortative mating, Supplementary Figure 2a again shows that the PGI respository correction is more vulnerable than ORIV in overestimating the true coefficient when the outcome is subject to assortative mating. Supplementary   Figure 2b shows that ORIV only slightly underestimates the direct genetic effect within-families irrespective of the level of assortative mating. Overall, the same patterns arise as in our benchmark scenarios, suggesting that the shortcut to proportionally reducing the number of SNPs and the GWAS sample size does not meaningfully affect our conclusions.

No standardisation
We designed our baseline simulations such that the true coefficient does not change with increasing levels of assortative mating (AM). To this purpose (as discussed in Supplementary Information 1), we standardise the true PGI to have mean zero and unit variance in each generation of the forward simulation, and then assign effect √ h 2 to this standardised PGI, when drawing Y as a linear function of this PGI and the other terms in the model. This approach keeps the 'contemporary' heritability fixed (with the exception of the design in which AM and genetic nurture are combined, where the ensuing correlation between the true PGI and nurture component confounds our true coefficient, see Supplementary Information 1.8). Another way to think about our standardisation is that, under AM and the resulting excess homozygosity rates, inflation in the variance of the PGI is avoided by shrinking all SNP effect sizes by the same factor in a given generation. This approach simplifies the formulation of the true coefficient and, hence, gives a clear and intuitive benchmark in all simulation scenarios (again, with the exception of the design in which AM and GN are combined, see Supplementary Information 1.8).
Here, we analyse an alternative implementation without standardisation in each generation. In this scenario, the true coefficient evolves over generations, as the panmictic heritability (i.e., h 2 0 in the notation of Border et al. 2 ) evolves towards the equilibrium heritability (i.e., h 2 ∞ ). To the best of our knowledge, the 'contemporary' heritability in a given generation cannot be expressed as a simple function of h 2 0 , h 2 ∞ , and the number of generations passed since the population switched from random mating to AM. Hence, there is no tractable expression available for the true coefficient in that case. Instead, in the simulations without per-generation standardisation in the DGP, we regress the outcome on the true PGI in the prediction sample to obtain an accurate and unbiased estimate of the true between-family coefficient. In addition, to estimate the true within-family coefficient in that scenario, we use the same regression, but then also controlling for family-specific fixed effects.
Supplementary Figure 3 presents the results of these simulations. The true standardized coefficient increases with increasing levels of assortative mating in the absence of standardisation. Still, as in the baseline simulations, ORIV provides consistent estimates of the true coefficient, whereas the PGI-RC overestimates the true coefficient when assortative mating is present. Within-family analyses. Data are presented as the estimated coefficients +/-1.96 times the standard error (95% confidence interval) for the Polygenic Index (PGI) based on between-family analyses (left-panel) and within-family analyses (right-panel) using meta-analysis (circles), Obviously Related Instrumental Variables (ORIV, rhombuses), the default PGI-RC procedure (squares), and the PGI-RC procedure taking into account uncertainty in the GREML estimates (triangles) in a scenario without standardisation of the genetic contributions towards the phenotypes in each generation, without genetic nurture, and with varying levels of assortative mating. The simulations hold constant the Genome-wide Association Study (GWAS) discovery sample such that the resulting meta-analysis PGI has an R 2 of 15.4%, and the prediction sample size is held fixed at N = 16, 000. The dashed line represents the true coefficient, which increases with higher levels of assortative mating in the absence of standardisation in each generation. The simulation results are based on 100 replications.

20/26 4 Supplementary Methods 2: Data Empirical Analysis
The empirical analysis uses siblings in the UK Biobank (UKB) 5 as a prediction sample for constructing the polygenic indices (PGIs). The GWAS discovery sample excludes these full siblings and their relatives up to the 3rd degree to ensure that the discovery and the prediction samples are independent from each other. The analyses were restricted to individuals of European ancestry only. For this purpose, we excluded individuals with a value larger than 0 on the first principal component of the genetic data. We also filtered based on the self-reported ethnicity. Specifically, we only include individuals if their ethnic background is "White", "British", "Irish", or "Any other white background".

Relatedness
The UKB provides a kinship matrix, which contains the genetically identified degree of relatedness for pairs of UKB participants related up to a third degree or closer. The kinship coefficient is constructed using KING software 6  and excluded those individuals from the GWAS discovery sample along with the full siblings. This ensures that our prediction sample of siblings is unrelated to the GWAS discovery sample. Importantly, in the GWAS discovery sample there are still some related (non-sibling) individuals. For this reason, we adjust for genetic relatedness in the GWASs. In particular, we kept one random cousin from the cousin clusters before randomly splitting the samples for GWAS.

Phenotypes
We follow the literature 7 to 10 years. If "none of the above" is selected, then the lowest level of 7 years is assigned. For height, we use the standing height (in centimeters) of participants measured as a part of the anthropometric data collection at the UKB assessment center.
For both phenotypes, we impute the missing values in the first wave with the available information from the two follow up measurements in the UKB.

GWAS
We individuals who withdrew consent, with bad genotyping quality, with putative sex chromosome aneuploidy, whose second chromosome karyotypes are different from XX or XY, with outliers in heterozygosity, with missing gender or self-reported gender mismatching with the genetically identified gender, individuals of non-European ancestry, or with missing information on any of the former criteria, on the phenotype or on any of the control variables. Further, we perform quality control on the resulting GWAS summary statistics using the EasyQC tool 11 .

Meta-Analysis
We meta-analyze our own GWAS summary statistics obtained with UKB data with the 23andMe summary statistics for the educational attainment PGI and with the GIANT 2014 summary statistics 12 for the height PGI using the software package

Polygenic indices
We construct the PGIs by accounting for the linkage disequilibrium (LD), i.e., the non-random correlations between SNPs at various loci of a single chromosome, using the LDpred tool 13 , version 1.06, and Python, version 3.6.6. LDpred is a Python based package that corrects the GWAS weights for LD using a Bayesian approach. We follow steps as summarized by 14 : (i) coordinate the base and the target files, (ii) compute the LD adjusted weights, and (iii) construct the polygenic index with PLINK 15 using the LD weighted GWAS summary statistics. As LD reference panel, we use 30,000 randomly selected unrelated individuals of EU ancestry in the UKB who passed the quality control protocol. The PGIs are based on approximately 1 million HapMap3 SNPs assuming an infinitesimal model (i.e., we do not apply p-value thresholds). More details can be found in Supplementary Table 9. The final prediction sample consists of N = 35, 282 siblings.

Supplementary Results 3: Robustness Empirical Analysis
In this section we discuss the robustness of our empirical results.

Diastolic blood pressure
In Supplementary

The between-family and within-family component of the PGI
The correlation between the independent PGIs reflects the measurement error in the PGI of the between-family component.
There will be some bias in the within-family estimates if the measurement error of the within-family signal is considerably different from the measurement error of the between-family signal. We therefore computed the correlation between deviations of the PGI from the family means in the UKB, and use this as the scaling factor in the within-family analysis (we thank Elliot Tucker-Drob for this suggestion). The results were very similar compared to the main results. For EA, the within-family two-sample and split-sample coefficients were estimated as 0.179 (0.012) and 0.161 (0.012), respectively. For comparison, the coefficients are 0.184 (0.012) and 0.170 (0.013) in Table 2. For height, the within-family two-sample and split-sample ORIV coefficients were estimated as 0.604 (0.008) and 0.556 (0.009), respectively. In Table 3, the coefficients are 0.614 (0.008) and 0.571 (0.009). Full results for this robustness check are available upon request from the authors. These findings suggest that the measurement error across between-and within-family signals is comparable and does not represent a relevant threat to the application of ORIV for these traits.

Comparing ORIV with regular IV
In this section we compare the ORIV estimator to the regular IV estimator suggested by DiPrete et al. 17 for our between-family empirical analyses regarding EA (Supplementary Table 11) and height (Supplementary Notes: * p-value < 0.10; * * p-value < 0.05; * * * p-value < 0.01 of a two-sided t-test (OLS, PGI-RC) or two-sided z-test (ORIV) without adjustments for multiple comparisons. In all regressions the dependent variable is residualized diastolic blood pressure (standardized to have mean 0 and standard deviation 1), where the residuals are obtained after a regression of diastolic blood pressure controlling for sex, year of birth, month of birth, sex interacted with year of birth, and the first 40 principal components of the genetic relationship matrix. Standard errors are robust and clustered at the family level, and in case of ORIV also at the individual level. OLS (UKB 1/2) refers to models with the PGI constructed using the split-sample UKB non-sibling (i.e., excluding all siblings and their relatives) sample. OLS (Meta-analysis) uses a PGI constructed using a meta-analysis of the two split-sample UKB non-sibling samples. ORIV (Split-sample) refers to a 2SLS estimation where the summary statistics derive from a random split of the UKB sample. PGI-RC refers to the PGI-RC method, where (Default) refers to the conventional application of the method, whereas (GREML unc.) refers to the case where we incorporate the uncertainty in the estimation of the SNP-based heritability.

24/26
Supplementary Notes: * p-value < 0.10; * * p-value < 0.05; * * * p-value < 0.01 of a two-sided z-test without adjustments for multiple comparisons. In all regres-  (2)) refers to a 2SLS estimation using the split-sample UKB PGI 1(2) is used as IV for UKB PGI 2(1). ORIV (Split-sample) refers to a 2SLS estimation where the summary statistics derive from a random split of the UKB sample. Notes: * p-value < 0.10; * * p-value < 0.05; * * * p-value < 0.01 of a two-sided z-test without adjustments for multiple comparisons. In all regressions the dependent variable is residualized height (standardized to have mean 0 and standard deviation 1), where the residuals are obtained after a regression of height on controlling for sex, year of birth, month of birth, sex interacted with year of birth, and the first 40 principal components of the genetic relationship matrix. Family fixed effects are not included. Standard errors are robust and clustered at the family level, and in case of ORIV also at the individual level. IV (UKB) refers to a 2SLS estimation using PGI on the basis of the UKB instrumented by the PGI constructed using the GIANT summary statistics, while IV (GIANT) is a 2SLS regression where the roles are reversed. ORIV (2-sample) refers to a 2SLS estimation using the PGIs from the UKB non-sibling sample and GIANT as instrumental variables for each other. IV (UKB1(2)) refers to a 2SLS estimation using the split-sample UKB PGI 1(2) is used as IV for UKB PGI 2(1). ORIV (Split-sample) refers to a 2SLS estimation where the summary statistics derive from a random split of the UKB sample.